Abstract

Bayesian hierarchical methods implemented for small area estimation focus on 
reducing the noise variation in published government official statistics by borrowing
information among dependent response values. Even the most flexible models
confine parameters defined at the finest scale to link to each data observation in a
one-to-one construction. We propose a Bayesian multiresolution formulation that
utilizes an ensemble of observations at a variety of coarse scales in space and time
to additively nest parameters we define at a finer scale, which serve as our focus
for estimation. Our construction is motivated by and applied to the estimation of
1-year period employment levels, indexed by county, from statistics published at
coarser areal domains and multi-year intervals in the American Community Survey
(ACS). We construct a nonparametric mixture of Gaussian processes as the prior
on a set of regression coefficients of county-indexed latent functions over multiple
survey years. We evaluate a modified Dirichlet process prior that incorporates
county-year predictors as the mixing measure. Each county-year parameter of a
latent function is estimated from multiple coarse scale observations in space and
time to which it links. The multiresolution formulation is evaluated on synthetic 
data and applied to the ACS. 

Key words: Survey sampling, Gaussian process, Dirichlet process, Bayesian hierarchical
models, latent models, Markov chain Monte Carlo

1 Introduction 

The Local Area Unemployment Statistics (LAUS) program of the U.S. Bureau of Labor
Statistics (BLS) publishes employment and unemployment levels for all counties and
minor civil divisions (MCDs) (each of which nests within a county) across all states in
the U.S. The LAUS program uses by-county and MCD published employment statistics
from the American Community Survey (ACS) to compute local allocation proportions
of state employment levels. The ACS is a national survey, conducted annually by the
U.S. Census Bureau (Census), that replaces the information formerly published in the
decennial census long-form. The LAUS program applies these local allocation proportions
to published by-state employment estimates from the Current Population Survey to
render the local estimates of employment.

The ACS publishes sampling-weighted “direct estimates” (which we denote with the
term, “statistics”). (Direct estimates weight the response value for each household in the
sample back to the population from which it is drawn by using a sampling weight that is
inversely proportional to its inclusion probability to compose a total or mean statistic for
each domain and time period of interest.) Employment statistics are published at 1-, 3-
and 5-year time intervals (which we denote as “periods”) for each of a wide variety of
geographic domains. The longer time periods enable the collection and pooling of more
household samples to improve the estimation precision or coefficient of variation (CV);
hence, each period statistic corresponds to a single time interval computed from the
total sample collected during that period. In addition to pooling household observations
across years into multi-year periods, the ACS also aggregates counties into larger
geographic domains, such as metropolitan or micropolitan areas, to achieve a larger
sample size that allows publication of 1-year period statistics. Census determines
which periods and geographic domains to publish statistics in the ACS based on the
supporting population size in each geographic domain in order to ensure an acceptable
CV; for example, 1-year period statistics are published for all geographic domains
with populations > 65,000, while 3-year period statistics are provided for populations
> 20,000 and only 5-year period statistics are provided otherwise. A domain for which
1-year period statistics are published will also have published 3- and 5-year period
statistics, while a domain for which 3-year period statistics are published will also have
published 5-year period statistics. Most counties and MCDs in the U.S. are relatively
small, such that only 26% of all counties have published ACS 1-year period statistics.
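The publication rule described above can be sketched as a small Python function. The function name and the example populations are hypothetical and serve only as illustration; the thresholds are those quoted in the text, and the sketch ignores any other criteria Census may apply.

```python
def published_periods(population):
    """Return the ACS period lengths (in years) published for a domain.

    Thresholds follow the text: populations above 65,000 receive 1-, 3- and
    5-year statistics, populations above 20,000 receive 3- and 5-year
    statistics, and every other domain receives only 5-year statistics.
    """
    if population > 65_000:
        return [1, 3, 5]
    if population > 20_000:
        return [3, 5]
    return [5]


# Hypothetical domain populations, used only for illustration.
for name, pop in [("large", 250_000), ("medium", 40_000), ("small", 8_000)]:
    print(name, published_periods(pop))
```

The nesting of the returned lists mirrors the statement that any domain with 1-year statistics also has 3- and 5-year statistics.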

In order to apply a consistent proportion-based allocation scheme across all counties
and MCDs, the LAUS program is forced to use the 5-year period statistics, which are
published annually. While new sample observations are added to the 5-year published
statistics with each year, the resulting pooled, multi-year interval statistic is lagged and
possibly overly smoothed. The allocation proportion scheme may therefore fail to
capture near-term changes in economic conditions, such as the recent Great
Recession, which may dramatically alter the estimated proportions from one year to the
next. Our inferential goal in this paper is to develop a modeling approach that will
utilize the published ACS statistics provided at these varied time periods and spatial
domains to estimate latent, 1-year period values for all counties and MCDs, such that
the LAUS program may employ these model-based 1-year period estimates to construct
their local allocation proportions for all counties and MCDs in lieu of 5-year period
ACS statistics.


Bayesian hierarchical modeling is extensively used in small area estimation applied
to survey direct estimates published as official statistics by government agencies with
the goal to reduce estimation uncertainty by borrowing information among parameters
indexed by spatial area and often time period (Ghosh et al. 1998). The use of hierarchical
modeling facilitates the borrowing of estimation strength by shrinking all or some subset
of domain-period parameters to a common mean. Those domain-periods with higher
(known) variances (due to a relatively lower number of observations used to compose
the published direct estimate) are shrunk to a greater extent towards the common value
for the applicable subset of domains.
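To make the shrinkage mechanism concrete, the sketch below uses a textbook normal-normal model, which is an assumption chosen for illustration and not the model developed in this paper: a direct estimate with known sampling variance is combined with a common mean, and the weight on the direct estimate falls as its variance grows. The function name and all numbers are hypothetical.

```python
def shrink(direct, variance, common_mean, between_var):
    """Posterior mean of a domain value under a normal-normal model.

    direct      : published direct estimate for the domain
    variance    : known sampling variance of the direct estimate
    common_mean : mean shared by the subset of domains
    between_var : variance of the true domain values around the common mean
    """
    weight = between_var / (between_var + variance)  # weight on the direct estimate
    return weight * direct + (1.0 - weight) * common_mean


# Two hypothetical domains with the same direct estimate but different precision.
precise = shrink(direct=120.0, variance=4.0, common_mean=100.0, between_var=25.0)
noisy = shrink(direct=120.0, variance=100.0, common_mean=100.0, between_var=25.0)
print(precise, noisy)  # the noisier domain is pulled further towards 100
```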

Even the most sophisticated small area modeling approaches, however, parameterize
each regression mean to be linked one-to-one with an observed data point (Hawala and
Lahiri 2012). These models may not be used to extract denoised, single year estimates
for the over 74% of counties and MCDs that don’t have available 1-year period ACS
statistics. While the recent work of Bradley et al. (2014) appears to develop estimates
for small domains from larger ones, they allocate or apportion larger domain estimates.
They don’t attempt to estimate latent values for finer areas nested within coarser ones
that are viewed to generate the observed coarse estimates.

We introduce a Bayesian approach that constructs parameters to be indexed on a
fine scale and nest within one or more coarse-level observations in space and time. Our
approach employs multiple coarse-level observations, each of which provides some
information about a fine-level parameter that nests within it. We will see in the sequel that
the parameters represent de-noised county-level employment levels and are constrained
to sum to the mean of each ACS published data point of the domain and time period that
nest the counties represented by the parameters. There are often multiple 1- and/or
3-year period statistics published for these coarser spatial domains that may be used
to provide some information about the counties which exhaust them.

Our approach also leverages the nesting of years within (multi-year) periods; for
example, we use the 2008-2012 ACS publications, which will provide three 3-year
period statistics (2008-2010, 2009-2011 and 2010-2012). In the case where the
ACS publishes 3-year period statistics for county “A”, the parameter defined for 2010
in county A would link to (or nest within) all three statistics.
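The year-within-period nesting can be sketched as an indicator matrix in Python. The helper names are hypothetical; the sketch assumes periods are all runs of 1, 3 or 5 consecutive years within 2008-2012, which yields the Q = 9 periods listed in Table 1.

```python
YEARS = [2008, 2009, 2010, 2011, 2012]


def build_periods(years, lengths=(1, 3, 5)):
    """List every run of consecutive years of each length, shortest first."""
    periods = []
    for length in lengths:
        for start in range(len(years) - length + 1):
            periods.append(years[start:start + length])
    return periods


def link_matrix(years, periods):
    """Q x T indicator matrix: entry [q][j] is 1 when year j nests in period q."""
    return [[1 if year in period else 0 for year in years] for period in periods]


PERIODS = build_periods(YEARS)
LINKS = link_matrix(YEARS, PERIODS)

# 3-year periods that contain 2010: the 2010 parameter nests within each of them.
col = YEARS.index(2010)
three_year = [p for p, row in zip(PERIODS, LINKS) if len(p) == 3 and row[col] == 1]
print(len(PERIODS), three_year)
```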

We employ a flexible nonparametric mixture approach for estimation of regression
coefficients used to construct county-by-year parameters of each function, which allows
the data to shrink estimated posterior distributions of the functions towards sub-group
means. This data-induced dimension reduction permits identification of the functions
estimated from the coarser set of statistics that nest them. We refer to our approach as
a “multiresolution” formulation because it utilizes observations defined at varied areal
or time period resolutions for estimation of the by-county functions.

We specify the parameterization for our multiresolution likelihood and construct our
associated nonparametric model for estimating its parameters in Section 2. A brief
overview of our algorithm to sample the set of full conditional posterior distributions
defined by our model is discussed in Section 3. We then present estimated results for the
collection of county/MCD-year parameters from the ACS, perform a simulation study to
assess the accuracy of the ACS estimates, and offer a concluding discussion in the
sections that follow.

2 Method 

We begin exposition of our model formulation that will provide fine-scale, 1-year period
employment estimates for all counties and MCD domains by introducing their
parameterization and how they connect to the statistics published at coarser scales in a
likelihood statement. We will subsequently introduce the nonparametric prior distributions
that specify our probability model.

2.1 Multiresolution Parameterization 

In the discussion to follow, we will use “county” as a generic label to denote county
and minor civil division, the latter of which is primarily defined as a New England
township designation where MCDs are nested within counties. Let $f_{\ell j}$ denote the (latent)
employment level for $\ell = 1, \ldots, (N = 4751)$ counties over years, $j = 2008, \ldots, 2012$. The
counties are nested in larger core-based statistical areas (CBSAs), such as metropolitan
(metro) and micropolitan (micro) areas, combinations of those larger areas (called combined
statistical areas or CSAs), including balance of states that subtract out all larger CBSAs
and CSAs from each state. Larger states generally have both metro and micro areas,
as well as larger combinations of these. (Census defines all CBSAs and CSAs to fully
nest within a state.) Smaller states may have only one to a few micro areas and no
larger CSAs, other than the balance of state estimate that subtracts away the micro
areas. We denote all areas that geographically nest counties (which includes the counties,
themselves) by the term “block”, $b = 1, \ldots, B$, and all counties nest in one or more blocks.
We use published statistics for $B = 6074$ ACS blocks (that include the $N = 4751$
counties). Figure 1 presents a distribution for the number of block links of the set
of $N$ counties, from which we note that most counties link to 4 to 6 blocks (including
themselves). Multiple block linkages occur because a county may nest within a block
which is, in turn, nested within other blocks. Figure 2 presents an example for Amesbury
Town, Massachusetts, which links to 4 other blocks through successive nestings. We
index the multi-year periods by $q = 1, \ldots, Q$, where each index value links a particular
set of years. Table 1 presents the set of years, $j$, (indexing the columns) that link with
each period (row), $q$, where 1 denotes a link and 0, not.

Figure 1: Histogram of the number of block linkages for the N = 4734 counties. The
linkage counts include the self-linkage. (Horizontal axis: number of linked blocks.)

1. Amesbury Town
2. Essex County
3. Boston-Cambridge-Quincy Metro Area
4. Peabody Metro Division
5. Boston-Worcester-Manchester, MA-RI-NH CSA

Figure 2: Example of Block Nesting Structure for Amesbury Town, Massachusetts.

Table 1: Period, $q = 1, \ldots, (Q = 9)$, links to years, $j = 2008, \ldots, 2012$

Period, q    2008    2009    2010    2011    2012
    1          1       0       0       0       0
    2          0       1       0       0       0
    3          0       0       1       0       0
    4          0       0       0       1       0
    5          0       0       0       0       1
    6          1       1       1       0       0
    7          0       1       1       1       0
    8          0       0       1       1       1
    9          1       1       1       1       1

We may create a simple likelihood statement for each block-period statistic, $y_{bq}$,
based on those counties, $(\ell)$, that nest in block, $b$, and those years, $(j)$, that nest in
associated period, $q$, with,

$$y_{bq} \sim \mathcal{N}\left(\sum_{\ell \in b} \sum_{j \in q} f_{\ell j},\ \sigma^2_{bq}\right) \qquad (1)$$

$$f_{\ell j} = x_{\ell j}^{\prime} \beta_{\ell j}, \qquad (2)$$



where the associated block-period variances, $\sigma^2_{bq}$, are known. We observe that the
$(f_{\ell j})$ are constrained to sum to the de-noised mean of each observation, $y_{bq}$, which
nests the associated counties and years. A $P \times 1$ county-year set of predictors, $x_{\ell j}$, is
incorporated into the model for the function, $f_{\ell j}$, with associated $P \times 1$ coefficients,
$\beta_{\ell j}$. We construct $x_{\ell j}$ with an intercept and a set of predictors defined at the
county-year level available from administrative data. The Quarterly Census of Employment
and Wages (QCEW) is a census instrument targeted to business establishments (rather
than households targeted by the ACS) that collects employment levels (on a monthly
basis), which we aggregate to county and year. Our QCEW county-year predictors
are employment levels for 12 “super sectors” defined in the North American Industry
Classification System (NAICS): 1. Agricultural; 2. Natural resources and mining; 3.
Construction; 4. Manufacturing; 5. Trade, transportation, utilities; 6. Information; 7.
Financial activities; 8. Professional and business services; 9. Leisure and hospitality;
10. Other services; 11. Public Administration; 12. Unclassified. We intend these
12 predictors, together, to describe the composition of the economic activity for each
county, by year, which we believe may provide a root-cause driver for employment level
statistics. We also include state records of unemployment claims aggregated to counties
in our predictor set as a measure of economic health. Our predictors will be critical to
identify the regression coefficients and to regulate the borrowing of information for their
estimation (through shrinkage).
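The way a block-period statistic draws on several county-year parameters in Equation (1) can be sketched in Python. This is a toy illustration and not the authors' implementation: it assumes the mean is the plain double sum over nested counties and years described in the text, and the block, period and county labels as well as every number are hypothetical.

```python
import math


def block_period_mean(f, block_counties, period_years):
    """Sum latent county-year levels over counties in a block and years in a period."""
    return sum(f[(county, year)] for county in block_counties for year in period_years)


def log_likelihood(y, var, f, blocks, periods):
    """Gaussian log-likelihood of block-period statistics with known variances.

    y and var map (block, period) -> statistic and its known variance;
    blocks maps block -> nested counties; periods maps period -> nested years.
    """
    total = 0.0
    for (b, q), y_bq in y.items():
        mean = block_period_mean(f, blocks[b], periods[q])
        total += (-0.5 * math.log(2.0 * math.pi * var[(b, q)])
                  - 0.5 * (y_bq - mean) ** 2 / var[(b, q)])
    return total


# Hypothetical latent levels for two counties over three years.
f = {("A", 2009): 10.0, ("A", 2010): 11.0, ("A", 2011): 12.0,
     ("B", 2009): 5.0, ("B", 2010): 6.0, ("B", 2011): 7.0}
blocks = {"A": ["A"], "metro": ["A", "B"]}          # county A is also its own block
periods = {"2010": [2010], "2009-2011": [2009, 2010, 2011]}
y = {("A", "2009-2011"): 33.5, ("metro", "2010"): 16.0, ("metro", "2009-2011"): 52.0}
var = {key: 1.0 for key in y}
print(log_likelihood(y, var, f, blocks, periods))
```

In this toy setup county B has no statistic of its own, so its parameters are informed only through the two metro-level observations that nest it.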

We next define the prior distributions that permit flexibility in the borrowing of
information for shrinkage in the estimation of the county-year regression coefficients.


2.2 Prior on Functions 


The parameterization of Equation 2 collects the $P \times T$ matrix of coefficients,
$B_\ell = (\beta_{\ell 1}, \ldots, \beta_{\ell T})$, indexed by county, $\ell = 1, \ldots, N$, on which we impose a
conditional matrix variate Gaussian prior,

$$B_\ell \mid \Lambda_{y,\ell}, \kappa_\ell \sim 0 + \mathcal{N}_{P \times T}\left(\Lambda_{y,\ell},\ C(\kappa_\ell)\right), \qquad (3)$$

under the notation of Dawid (1981), where the $P \times P$, $\Lambda_{y,\ell}$, represents the precision
matrix for the set of $P \times 1$ columns of $B_\ell$ and the $T \times T$, $C(\kappa_\ell)$, denotes the covariance
matrix for the rows of $B_\ell$. The county-indexed covariance matrix, $C_\ell$, is parameterized
by $\kappa_\ell$. This specification is equivalent to the $TP \times TP$ covariance matrix constructed
as $\Lambda_{y,\ell}^{-1} \otimes C(\kappa_\ell)$ under a multivariate Gaussian prior on the vector obtained by stacking
the rows of $B_\ell$. The separable or tensor form we use for the covariance matrix reflects
parsimony relative to a general $TP \times TP$ covariance matrix. Yet, our parameterization
for the latent functions is more flexible than that of Hawala and Lahiri (2012), who define
$f_{\ell j} \sim \mathcal{N}(\mu_\ell + x_{\ell j}^{\prime}\beta, \sigma^2)$ (and each $f_{\ell j}$ is linked, one-to-one, to observation, $y_{\ell j}$, differently
from our multiresolution construction, such that their model may not be employed to
extract county-level, 1-year period estimates from the ACS).










We fix a particular county, $\ell$, and introduce the Gaussian process covariance formulation
we construct for each of the $P$, $T \times 1$ rows of $B_\ell = (\beta_{\ell 1}, \ldots, \beta_{\ell P})^{\prime}$. The parameters,
$\kappa_\ell$, are used to specify a covariance formula for each cell of $C(\kappa_\ell)$. Selecting (the $T \times 1$)
row, $p$, of $B_\ell$, the covariance formula is specified with,

$$C(\kappa_\ell) = \left(C_{\ell,jk}\right)_{j,k \in (2008, \ldots, 2012)}, \qquad C_{\ell,jk} = \mathrm{Cov}\left(\beta_{\ell p j}, \beta_{\ell p k}\right) = \frac{1}{\kappa_{\ell,1}} \left(1 + \frac{(t_{\ell j} - t_{\ell k})^2}{\kappa_{\ell,2}\,\kappa_{\ell,3}}\right)^{-\kappa_{\ell,3}},$$

where $\kappa_\ell = (\kappa_{\ell,1}, \kappa_{\ell,2}, \kappa_{\ell,3})$, which parameterizes a rational quadratic covariance formula.
The rational quadratic covariance formula may be derived as a scale mixture (over $\kappa$)
of more commonly-used squared exponential kernels, $\frac{1}{\kappa_1}\exp\left(-(t_j - t_k)^2/\kappa\right)$ (Rasmussen
and Williams 2006). The vertical magnitude of surfaces rendered from a GP with the
rational quadratic covariance formula is directly controlled by $\kappa_{\ell,1}$, while $\kappa_{\ell,2}$ controls the
mean length scale or period, and $\kappa_{\ell,3}$ controls smooth deviations from the mean length
scale. Our choice of the rational quadratic covariance formula is intended as a parsimonious
specification for parameterizing the use of a single covariance matrix, rather
than utilizing a sum or product of multiple covariance matrices, each under the simpler
squared exponential covariance formula. See Savitsky et al. (2011) for more background
on the Gaussian process covariance formulations. Our GP prior, parameterized by the
$T \times T$ covariance matrix, $C(\kappa_\ell)$, under a rational quadratic formulation produces rows of
$B_\ell$ that are infinitely smooth (because they are differentiable at all orders), which will, in
turn, produce a smooth estimation for the $T \times 1$ de-noised function, $f_\ell$. The smoothness
restriction helps separate signal captured in $f_\ell$ from the rough, non-differentiable noise
in the observations, $(y_{bq})$, to which $f_\ell$ is linked. We believe this smoothness assumption
is reasonable to separate signal from noise present in the ACS statistics and rely on it
to help identify the regression coefficients. The $P \times P$ precision matrix, $\Lambda_{y,\ell}$, allows the
data to estimate a dependence among the $P$ sets of $T \times 1$ functions, each drawn from
the Gaussian process.
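A short Python sketch may help to see how the three rational quadratic parameters act. It assumes the covariance takes the form $(1/\kappa_1)\,(1 + (t_j - t_k)^2/(\kappa_2 \kappa_3))^{-\kappa_3}$; the function names and parameter values are hypothetical, and the squared exponential helper is included only to show the limiting behaviour for large $\kappa_3$.

```python
import math


def rational_quadratic(times, k1, k2, k3):
    """T x T covariance: (1/k1) * (1 + (t_j - t_k)^2 / (k2 * k3)) ** (-k3)."""
    return [[(1.0 / k1) * (1.0 + (tj - tk) ** 2 / (k2 * k3)) ** (-k3)
             for tk in times] for tj in times]


def squared_exponential(times, k1, k2):
    """T x T covariance: (1/k1) * exp(-(t_j - t_k)^2 / k2)."""
    return [[(1.0 / k1) * math.exp(-(tj - tk) ** 2 / k2)
             for tk in times] for tj in times]


years = [2008, 2009, 2010, 2011, 2012]
C = rational_quadratic(years, k1=0.5, k2=4.0, k3=1.0)
print([round(v, 4) for v in C[0]])  # covariance of 2008 with each year decays with lag

# With a very large k3 the rational quadratic approaches a squared exponential.
C_limit = rational_quadratic(years, k1=0.5, k2=4.0, k3=1e6)
C_se = squared_exponential(years, k1=0.5, k2=4.0)
```

The diagonal equals $1/\kappa_1$, which is the sense in which $\kappa_1$ sets the vertical magnitude.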


2.3 Clustering the Distributions of the Coefficients, $(B_\ell)$

Define $\Theta_\ell = \{\Lambda_{y,\ell}, \kappa_\ell\}$, where we note that the indexing by county, $\ell = 1, \ldots, N$,
in Equation 3 instantiates a marginal mixture (of matrix variate Gaussians) prior for
$(B_1, \ldots, B_N)$. We will next define a nonparametric prior distribution for $\Theta_\ell$ that will
allow the data to estimate probabilistic clusters, such that those counties, $\{\ell\}$, whose
$(\Theta_\ell)$ are assigned to the same cluster will draw their coefficients, $(B_\ell)$, from the same
Gaussian mixture component. We (probabilistically) cluster the parameters of the Gaussian
prior that generates each $B_\ell$, rather than directly clustering the set of $(B_\ell)$, because
we don’t expect any of the coefficients (and associated $T \times 1$ functions, $(f_\ell)$) to be exactly
equal. Rather, we expect subsets of functions to be “similar”, which we define as drawing
their coefficients (assigned to the same cluster) from the same Gaussian distribution.

We specify a Dirichlet process prior for $(\Theta_\ell)$ in,

$$\Theta_1, \ldots, \Theta_N \mid G \sim G \qquad (4a)$$

$$G \mid \alpha, G_0 \sim \mathrm{DP}(\alpha, G_0), \qquad (4b)$$

where $(\Theta_\ell)_{\ell = 1, \ldots, N}$ receive a random distribution prior, $G$, drawn from a Dirichlet
process (DP), parameterized with a concentration parameter, $\alpha$, a precision parameter that
controls the amount of variation in $G$ around prior mean, $G_0$. The base or mean
distribution, $G_0 = \mathcal{W}(P + 1, I_P) \times \prod_{d=1}^{D} \mathcal{G}a(a, b)$, a $P$ dimensional Wishart distribution
for the $P \times P$, $\Lambda_{y,\ell}$, and a product of Gamma priors for the $D = 3$ parameters in the
rational quadratic specification for the parameters, $\kappa$, that parameterize the $T \times T$
covariance matrix, $C$, respectively. Equation 4 describes a mixture model of the form,
$B \mid G \sim \int 0 + \mathcal{N}_{P \times T}\left(\Lambda_y, C(\kappa)\right) G\left(d(\Lambda_y, \kappa)\right)$, where $G$ is the mixing measure over the
precision and covariance parameters, $\Theta = \{\Lambda_y, \kappa\}$.

The DP formulation may be described as approximating any unknown distribution
by placing spikes at “location” values in the support of $G$, which are each drawn from
$G_0$, with heights equal to probability mass values associated to the locations, such that
draws from $G$ are almost surely discrete. The discrete construction for $G$ allows for ties
among the $(\Theta_\ell)$ that we interpret as probabilistic clusters. We examine this clustering
property of the DP by expressing it in the (stick breaking) form as a set of weighted
locations (Sethuraman 1994),

$$G = \sum_{h=1}^{\infty} p_h \delta_{\Theta^*_h}, \qquad (5)$$

where $G$ is a countably infinite mixture of weighted point masses with “locations”,
$\Theta^*_1, \ldots, \Theta^*_M$, indexing the unique values for the $(\Theta_\ell)$, where $M \leq N$ (counties from
the finite population). We record cluster memberships of counties with $s = (s_1, \ldots, s_N)$
where $s_\ell = m$ denotes $\Theta_\ell = \Theta^*_m$ so that $\{s, (\Theta^*_m)\}$ provides an equivalent
parameterization to $(\Theta_\ell)$ and we recover $\Theta_\ell = \Theta^*_{s_\ell}$. The weight, $p_h \in (0, 1)$, is composed as
$p_h = v_h \prod_{k=1}^{h-1} (1 - v_k)$, where $v_h$ is drawn from the beta distribution, $\mathcal{B}e(1, \alpha)$. This
construction provides a prior penalty on the number of mixture components, but we also
see that a higher value for $\alpha$ will produce more clusters (unique locations). Since each
location is drawn from $G_0$, as the number of unique locations increases, the estimated
$G$ approaches the base distribution, $G_0$. We place a further gamma prior on $\alpha$ to allow
posterior updating in recognition of the relatively strong influence it conveys on the
number of clusters formed (Escobar and West 1995).
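The stick-breaking weights of Equation (5) can be simulated with a few lines of Python. The sketch truncates the countably infinite sum at a finite number of sticks, which is an assumption made only so the example terminates; the function name, seed and values of $\alpha$ are hypothetical.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Weights p_h = v_h * prod_{k<h} (1 - v_k) with v_h ~ Beta(1, alpha)."""
    weights, remaining = [], 1.0
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)   # break off a fraction v of what is left
        remaining *= 1.0 - v
    return weights


rng = random.Random(0)
for alpha in (0.5, 5.0):
    w = stick_breaking_weights(alpha, truncation=50, rng=rng)
    # Report the total mass captured and how many weights exceed 1%.
    print(alpha, round(sum(w), 4), sum(1 for p in w if p > 0.01))
```

Running the loop with a small and a large $\alpha$ gives a feel for how the concentration parameter spreads mass over more locations.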


2.4 Predictor-Assisted Clustering 

We have, so far, specified a likelihood linking subsets of county-year functions, $(f_{\ell j})$, to
each of the block-period statistics, $y_{bq}$. The structure in our model is defined through the
regression model on $(f_{\ell j})$ under the subsequent hierarchical prior formulation we
constructed for $(B_\ell)$. If we had imposed the DP prior directly on the $(B_\ell)$, the estimated
functions would have been locally linear (for each subset of county-indexed coefficients
assigned to the same cluster), but globally non-linear. We defined a nonparametric mixture
prior for $(B_\ell)$ by placing the DP prior on the covariance parameters, $\Theta_\ell = \{\Lambda_{y,\ell}, \kappa_\ell\}$, of
the Gaussian prior of Equation 3, such that the estimated functions will be both locally
and globally non-linear.

The clustering of the counties is determined from the conditional distribution for
$Y = (y_{bq}) \mid (X_\ell)_{\ell = 1, \ldots, N}$ since we fix the predictors, $(X_\ell)$. Our estimation task is
challenging because we will not have a one-to-one relationship between most block-period
observations, $Y$, and latent county-year parameters, $(f_{\ell j})$. So we would like to borrow
the maximum amount of information provided in our data by incorporating the
predictor values into the computation of probabilities for the co-clustering of the county
covariance parameters of $\{B_\ell\}$. If the $P \times T$ matrix of predictors, $X_\ell$, for county, $\ell$, is
very similar to, $X_{\ell'}$, for county, $\ell'$, then we would like to define a higher prior probability
for $\Theta_\ell = \Theta_{\ell'} = \Theta^*_m$, in which case $B_\ell$ is drawn from the same matrix-variate Gaussian
as $B_{\ell'}$, producing function $f_\ell$ that is similar to $f_{\ell'}$.


We modify an approach of Müller et al. (2011) to allow definition of a DP prior
construction that incorporates the predictors, $(X_\ell)_{\ell = 1, \ldots, N}$, into the determination of the
clusters. We will treat the $P \times T$ predictor matrices, $(X_1, \ldots, X_N)$, as though they
were random (though we believe they are not random) as a computational device to
induce the utilization of the predictors, as well as the response, in the estimation of
the clustering (or partition) over county-indexed covariance parameters, $(\Theta_\ell)$. We next
specify a probability model for the $(X_\ell)$ and show how we will use it in determination
of the cluster assignments,

$$X_\ell \sim A_\ell + \mathcal{N}_{P \times T}\left(H_x,\ I_T\right) \qquad (6a)$$

$$\overset{P \times T}{A_\ell} \sim 0 + \mathcal{N}_{P \times T}\left(\overset{P \times P}{\Lambda_{x,\ell}},\ \overset{T \times T}{Q(\tau_{x,\ell}, \rho_{x,\ell})}\right) \qquad (6b)$$

$$Q(\tau_{x,\ell}, \rho_{x,\ell}) = \tau_{x,\ell}\left(D_x - \rho_{x,\ell}\,\Omega_x\right), \qquad (6c)$$

where $H_x \sim \mathcal{W}(P + 1, I_P)$. $Q(\tau_{x,\ell}, \rho_{x,\ell})$ is constructed as a conditional autoregressive (CAR)
prior (Rue and Held 2005) that is similar in idea to the GP prior on $B_\ell$, but tends to
render rough, non-differentiable surfaces, rather than the smooth surfaces generated by a
GP prior. We use the CAR prior because it is computationally faster to draw posterior
samples than the GP and we are not concerned with generating de-noised functions
from $X_\ell$, but only use the parameters of $A_\ell$ to help determine the clustering of the
covariance parameters of $B_\ell$. The $T \times T$, $D_x$, is a diagonal matrix that sums the rows
of the $T \times T$, $\Omega_x$, a similarity or adjacency matrix between pairs of time points (with
zeros for the diagonal values). So each entry in $D_x$ expresses the relative influence or
precision for each time point. The parameter, $\tau_{x,\ell} \sim \mathcal{G}a(a = 1, b = 1)$, controls the
scale and, $\rho_{x,\ell} \sim \mathcal{U}(-1, 1)$, controls the degree of autocorrelation. The CAR prior may
be heuristically thought of as a local, random walk smoother with a fixed length scale
(unlike the GP, where the data estimate the length scale). See Savitsky and Paddock
(2013) for more details about the CAR prior.
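The CAR precision matrix can be assembled directly from an adjacency matrix. The Python sketch below assumes the form $Q = \tau (D_x - \rho\,\Omega_x)$ and, as a further assumption not stated in the text, takes $\Omega_x$ to link each year only to its immediate neighbours; the function name and parameter values are hypothetical.

```python
def car_precision(n_times, tau, rho):
    """Q = tau * (D - rho * Omega) for first-order adjacency between time points."""
    # Omega: 1 for consecutive time points, 0 elsewhere (including the diagonal).
    omega = [[1.0 if abs(j - k) == 1 else 0.0 for k in range(n_times)]
             for j in range(n_times)]
    degree = [sum(row) for row in omega]  # diagonal of D: row sums of Omega
    return [[tau * ((degree[j] if j == k else 0.0) - rho * omega[j][k])
             for k in range(n_times)] for j in range(n_times)]


Q = car_precision(n_times=5, tau=2.0, rho=0.5)
for row in Q:
    print(row)
```

The banded result shows why draws under this prior are cheap: each time point depends only on its neighbours, with $\tau$ scaling the precision and $\rho$ setting the strength of the neighbour dependence.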


We now extract $\{\Lambda_{x,\ell}, \tau_{x,\ell}, \rho_{x,\ell}\}$ and simply expand $\Theta_\ell = \{\Lambda_{y,\ell}, \kappa_\ell, \Lambda_{x,\ell}, \tau_{x,\ell}, \rho_{x,\ell}\}$
under the DP prior of Equation 4, which now incorporates information about $X_\ell$ into the
clustering of $B_\ell$. To gain insight into how treating $X_\ell$ as random influences the clustering
mechanism, we present the kernel of the full conditional posterior distributions for the
$N \times 1$ vector of cluster indicators, $s$, after using the Polya urn scheme (Sethuraman
1994) to marginalize out the random measure, $G$,

$$f\left(s_\ell \mid s_{-\ell}, B_\ell, A_\ell, \Theta^*\right) \propto \sum_{s=1}^{M^-} \frac{n_{-\ell,s}}{N - 1 + \alpha}\, \delta(s_\ell = s)\, L(B_\ell, A_\ell) + \frac{\alpha}{N - 1 + \alpha}\, \delta(s_\ell = M^- + 1)\, L(B_\ell, A_\ell), \qquad (7)$$

that is a product of the mixture prior,
$f(s_\ell \mid s_{-\ell}) = \sum_{s=1}^{M^-} \frac{n_{-\ell,s}}{N - 1 + \alpha}\, \delta(s_\ell = s) + \frac{\alpha}{N - 1 + \alpha}\, \delta(s_\ell = M^- + 1)$
(which assigns counties to clusters with probabilities proportional to their popularity, as
measured by the number of counties assigned to cluster $s$, and with probability
proportional to $\alpha$ generates a new cluster) and the joint likelihood,
$L(B_\ell, A_\ell) = \mathcal{N}_{P \times T}\left(B_\ell \mid \Lambda^*_{y,s}, C(\kappa^*_s)\right) \times \mathcal{N}_{P \times T}\left(A_\ell \mid \Lambda^*_{x,s}, Q(\tau^*_{x,s}, \rho^*_{x,s})\right)$. This computation
reveals that the conditional posterior distribution for the cluster allocation of county $\ell$
is a function of both the likelihood of $B_\ell$, estimated from $Y = (y_{bq})$, and also that for
$A_\ell$, which is estimated from $X_\ell$. So the use of the joint likelihood in the full conditional
posterior for the allocation of counties to clusters demonstrates that the cluster
assignments are now controlled by the joint distribution for $(Y, (X_\ell))$.
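The structure of the kernel in Equation (7) can be sketched numerically: each existing cluster is weighted by its count times the joint likelihood, and a new cluster by $\alpha$ times the joint likelihood. In this Python sketch the log-likelihood values are hypothetical placeholders standing in for $\log L(B_\ell, A_\ell)$ evaluated at each cluster's locations; the common denominator cancels on normalization.

```python
import math


def assignment_probabilities(counts, alpha, log_lik_existing, log_lik_new):
    """Normalized probabilities of joining each existing cluster or a new one.

    counts[s] is the number of other counties in cluster s; the log-likelihood
    arguments stand in for log L(B, A) evaluated at each cluster's locations.
    """
    log_w = [math.log(n) + ll for n, ll in zip(counts, log_lik_existing)]
    log_w.append(math.log(alpha) + log_lik_new)
    top = max(log_w)                      # subtract the max for numerical stability
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


# With equal likelihoods the probabilities reduce to the prior "popularity" weights.
print(assignment_probabilities([3, 1], alpha=1.0,
                               log_lik_existing=[0.0, 0.0], log_lik_new=0.0))
# A much better fit to the small cluster can outweigh its lower popularity.
print(assignment_probabilities([3, 1], alpha=1.0,
                               log_lik_existing=[-5.0, 0.0], log_lik_new=-5.0))
```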

Müller et al. (2011) point out that it is not necessary to believe the $(X_\ell)$ are random
in the formulation of Equation 7 that relies on Equation 6 to inject predictor information
into the distribution over the clusterings (or partitions); rather, our assignment of a
prior distribution to the $(X_\ell)$, as part of a joint model with $Y$, may be viewed as a
computational device to implement a new prior distribution for the clusterings that
incorporates the $(X_\ell)$.

The joint prior for the cluster indicators, $s_1, \ldots, s_N$, under the simpler model of
Section 2.3 that parameterizes the conditional distribution for $Y \mid (X_\ell)_{\ell = 1, \ldots, N}$ is stated
with,

$$f(s_1, \ldots, s_N) \propto \alpha^{M-1} \prod_{m=1}^{M} (n_m - 1)!, \qquad (8)$$

after marginalizing out the random measure, $G$, where, $n_m = \sum_{\ell=1}^{N} \delta(s_\ell = m)$, denotes
the number of counties assigned to cluster, $m$. As earlier noted, this prior for cluster
assignments is independent of the predictor values, $(X_\ell)$.
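The partition prior of Equation (8) is simple to evaluate on the log scale, as the Python sketch below shows. The function name and the example assignments are hypothetical; `math.lgamma(n)` returns $\log((n-1)!)$ for a positive integer $n$.

```python
import math
from collections import Counter


def log_partition_prior(assignments, alpha):
    """Unnormalized log prior: (M - 1) * log(alpha) + sum_m log((n_m - 1)!)."""
    sizes = Counter(assignments).values()          # cluster sizes n_1, ..., n_M
    return (len(sizes) - 1) * math.log(alpha) + sum(math.lgamma(n) for n in sizes)


# Four hypothetical counties: one cluster of four versus two clusters of two.
one_cluster = log_partition_prior([1, 1, 1, 1], alpha=1.0)
two_clusters = log_partition_prior([1, 1, 2, 2], alpha=1.0)
print(one_cluster, two_clusters)
```

With $\alpha = 1$ the single large cluster receives the larger prior weight, which illustrates the preference for fewer, more populous clusters unless $\alpha$ is large.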

Our formulation that parameterizes a joint distribution for $Y, (X_\ell)_{\ell = 1, \ldots, N}$ is
equivalent to the model for $Y \mid (X_\ell)_{\ell = 1, \ldots, N}$, but with Equation 8 adjusted to add information
about the predictors with,

$$f\left(s_1, \ldots, s_N \mid \overset{P \times T}{X_1}, \ldots, X_N\right) \propto \alpha^{M-1} \prod_{m=1}^{M} g(X^*_m)\,(n_m - 1)!, \qquad (9)$$

where our notation conditions on the $(X_\ell)$ for emphasis, though this prior doesn’t treat
them as random. In our mixture formulations, we define

$$g(X^*_m) = \int \prod_{\ell: s_\ell = m} f\left(X_\ell \mid \Theta^*_{x,m}\right) G_0\left(d\Theta^*_{x,m}\right), \qquad (10)$$

with $\Theta^*_{x,m} = \{\Lambda^*_{x,m}, \tau^*_{x,m}, \rho^*_{x,m}\}$. The form of $g(X^*_m)$ slightly generalizes Müller et al.
(2011) from a DP to our DP mixture. Müller et al. (2011) highlight that it is not
necessary for the “similarity” function, $g(X^*_m)$, to be specified as random. It should be invariant
to predictor labels and their scale, and assign larger probabilities of co-clustering where
$(X_\ell)_{\ell: s_\ell = m}$ are closer in value. We use a symmetric random probability distribution
which possesses these properties for computational convenience.

The formulation of Equation 9 is also equivalent to replacing the single random
distribution, $G$, with a collection, $G_x = \sum_{h=1}^{\infty} p_{xh}\, \delta_{(\Lambda^*_{y,h}, \kappa^*_h)}$, that indexes weights, $(p_{xh})$,
by the predictor values (such that, marginally, each $G_x$ is a DP). Counties with similar
predictor values are assigned a relatively higher prior probability of co-clustering.


3 Posterior Computation 


We implement the posterior computations for the predictor-indexed mixture model,
specified in Section 2.4 (from which it is easy to derive the computations for the mixture
model of Section 2.3), in a sequential scan of parameter blocks from their full
conditional posterior distributions in the growfunctions2 package for R (R Core Team
2014), which is written in C++ for fast computation and available from the authors on
request. We briefly highlight aspects of our posterior sampling algorithm for the major
sets of parameters, below:


1. Model for $Y = (y_{bq})$

(a) Sample each block of $P \times T$ random effect coefficients, $(B_\ell)$, independently,
using the elliptical slice sampler (ESS) of Murray et al. (2010) for
block-sampling parameters under a multivariate Gaussian prior (that we
generalized to matrix variate Gaussian distributions). The ESS generates ($P \times T$)
proposals through a linear combination of a draw from the prior and the
previously sampled value, weighted by the sine and cosine of a phase angle, so
that the proposals lie on the ellipse parameterized with that
phase angle. The ESS uses a slice sampling algorithm (Neal 2000a) to draw
proposals for the phase angle. Proposals are evaluated with the likelihood,

$$L(B_\ell) = \prod_{b \in b(\ell)} \prod_{q \in q(b)} \mathcal{N}\left(\tilde{y}_{bq} \,\Big|\, \sum_{j \in q} x_{\ell j}^{\prime} \beta_{\ell j},\ \sigma^2_{bq}\right), \qquad (11)$$

where $b(\ell)$ denotes the (usually multiple) blocks in which county $\ell$ is nested.
Similarly, $q(b)$, denotes the often multiple periods, $q$, linked to block, $b$. We
define $\tilde{y}_{bq} = y_{bq} - \sum_{\ell' \neq \ell \in b} \sum_{j \in q} x_{\ell' j}^{\prime} \beta_{\ell' j}$ to subtract out estimated functions
for all other counties, $(\ell') \neq \ell$, which are also linked to $y_{bq}$.


(b) Sample the posterior distribution for locations of the GP covariance in
by-cluster groups, $(\kappa^*_{dm})_{d = 1, \ldots, D}$, from the subset of counties, $(B_\ell)$, assigned to
that cluster because $B_\ell \perp \kappa^*_{m'}$, for $m' \neq m$, a posteriori, in a
Metropolis-Hastings scheme using the following log-posterior kernel,

$$\log f\left(\kappa^*_{dm} \mid \{B_\ell : s_\ell = m\}, \Lambda^*_{y,m}\right) \propto -\frac{1}{2} n_m P \log\left(|C(\kappa^*_m)|\right) - \frac{1}{2} \mathrm{tr}\left[\sum_{\ell: s_\ell = m} C(\kappa^*_m)^{-1} B_\ell^{\prime} \Lambda^*_{y,m} B_\ell\right] + (a - 1)\log(\kappa^*_{dm}) - b\,\kappa^*_{dm}, \qquad (12)$$

where $(a, b)$ are shape and rate hyperparameters of a gamma prior,
respectively, which are both set equal to 1. This posterior representation is a
relatively straightforward Gaussian kernel of a non-conjugate probability model.
We adapt a Metropolis-Hastings algorithm of Wang and Neal (2013) for
sampling each $\kappa^*_{dm}$ that is designed to speed computation by introducing a
lower-dimensional temporary space where the likelihood (e.g. the $T \times T$, Gaussian
process covariance matrix, $C$) is approximated using a subset of the $T$
time-points. We develop a transition / proposal distribution based on composing
moves in the lower dimensional, temporary space (using a slice sampler),
where computations of the lower-dimensional GP covariance matrix are fast.
If the lower dimensional approximations are relatively good, this approach
will speed chain convergence by producing draws of lower autocorrelation
since each proposal includes a sequence of moves generated in the temporary
space for drawing an equivalent effective sample size. See Savitsky (2014) for
more details.


(c) Sample location, $\Lambda^*_{y,m}$, from a $P$ dimensional Wishart posterior with degrees
of freedom, $n_m T + (P + 1)$, and $P \times P$ inverse scale, $\sum_{\ell: s_\ell = m} B_\ell\, C(\kappa^*_m)^{-1} B_\ell^{\prime} + I_P$.

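(d) Sample cluster assignments, $\mathbf{s} = (s_{1}, \ldots, s_{N})$, from their full conditionals
using the Polya urn representation of Blackwell and MacQueen (1973),

$$
f\left(s_{\ell} = s \mid \mathbf{s}_{-\ell}, \theta^{*}, \alpha, \tau_{\epsilon}, \mathbf{B}_{\ell}, \Delta_{\ell}\right) \propto
\begin{cases}
n_{-\ell,s}\, L\left(\mathbf{B}_{\ell}, \Delta_{\ell}\right) & \text{if } 1 \le s \le M^{-}, \\
\dfrac{\alpha}{c^{*}}\, L\left(\mathbf{B}_{\ell}, \Delta_{\ell}\right) & \text{if } M^{-} < s \le h,
\end{cases}
\qquad (13)
$$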

where $n_{-\ell,s} = \sum_{\ell^{\prime} \neq \ell} 1(s_{\ell^{\prime}} = s)$ is the number of counties, excluding unit $\ell$,
assigned to cluster $s$, so that units are assigned to an existing cluster with
probability proportional to its “popularity”, and $M^{-}$ denotes the total number
of clusters when unit $\ell$ is removed (which is equal to $M$ unless $\ell$ is a member
of a singleton cluster). The posterior assigns a county (through $s_{\ell}$) to a new
cluster with probability proportional to $\alpha q_{0} = \int \mathcal{N}(\mathbf{B}_{\ell} \mid \kappa, \ldots)\, G_{0}(d\kappa)$, which
requires the likelihood to be integrable in closed form with respect to the base
distribution; this is not the case under our non-conjugate parameterization
through the GP covariance matrix. So we utilize the auxiliary Gibbs sampler
formulation of Neal (2000b) and sample $c^{*} \in \mathbb{N}$ (typically set equal to 2 or 3)
locations from the base distribution, $G_{0}$, ahead of any assigned observations, to
define $h = M^{-} + c^{*}$ candidate clusters in an augmented space. We then draw
from this augmented space, where any location not assigned units (over a
set of draws for $\mathbf{s}$) is dropped.


2. Model for = (x^j) 


(a) Sample P xT, A^, independently, by stacking the transpose of the P , T x 1 
rows of Ac to form the PT x 1, 5y^e, from which we perform a draw from the 
following conjugate Gaussian posterior. 


/ p„,,|X,,H„s, Ap,,Q (y„,ppj) =Mpt (hj.cAp), 


(14) 


where we define PT x 1, with = (H^. (g) I'^), while Xy^c

is formed by stacking the transpose of the rows of X^. Posterior precision. 
(ps = iix,T + ® Q Finally, compose

(b) Sample the location parameters, (r* , of the T x T CAR precision matrix, 

Q, from the Gamma distribution. 


/ = m),p*^„) = ea(ai,6i), 


(15)


with shape, Oi = O-Sn^F-P+a, and rate, bi = O.Str X]£:s(£)=m ^ 

where R;;, = (D,^, - p*mflx)- 

Next, sample p* ^ using a slice sampler with the following posterior evaluation 
kernel. 


log / (Px,ml(^^ -81 = 771), r* J 


cx 0.5n^Plog \R*J + 0.5r*^p;^^tr 


_i:s£=m 


(16) 
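
The cluster-assignment draw of step 1(d) can be sketched numerically as follows. This is only an illustration of the auxiliary-variable weighting in Equation 13, not the sampler used for the results: the function name `assignment_probs` and its arguments are hypothetical, and the log-likelihood values are assumed to have been evaluated elsewhere (under each existing cluster location and under each of the $c^{*}$ locations drawn from $G_{0}$).

```python
import numpy as np

def assignment_probs(counts, loglik_existing, loglik_aux, alpha):
    """Normalized full-conditional probabilities for one county's cluster label.

    counts          : sizes of the M^- existing clusters, excluding the county
    loglik_existing : log-likelihood of the county under each existing cluster
    loglik_aux      : log-likelihood under each of the c* auxiliary locations
    alpha           : DP concentration parameter
    """
    counts = np.asarray(counts, dtype=float)
    loglik_existing = np.asarray(loglik_existing, dtype=float)
    loglik_aux = np.asarray(loglik_aux, dtype=float)
    c_star = loglik_aux.size
    # Existing clusters are weighted by their popularity; each auxiliary
    # candidate receives prior mass alpha / c*.
    log_w = np.concatenate([
        np.log(counts) + loglik_existing,
        np.log(alpha / c_star) + loglik_aux,
    ])
    log_w -= log_w.max()          # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

# Made-up inputs: two existing clusters of sizes 3 and 1, two auxiliary
# candidates, and equal likelihoods so that only the prior weights matter.
rng = np.random.default_rng(0)
probs = assignment_probs([3, 1], [0.0, 0.0], [0.0, 0.0], alpha=1.0)
new_label = rng.choice(probs.size, p=probs)   # an index >= 2 would open a new cluster
```

Working on the log scale avoids underflow when the matrix-variate Gaussian likelihoods are very small, which is the typical situation for $P \times T$ coefficient matrices.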


4 Results for the ACS 


Our likelihood sums the county-year parameters, $(f_{\ell j})$, nested in each
block-period statistic, $y_{bq}$. Conversely, there are multiple statistics (indexed by
block-period) that link to each county-year parameter, each of which provides some
information to support the estimation of that parameter. Figure 3 presents a conceptual
illustration for a hypothetical county linked to a block, “b”, where block b, in turn,
includes published observations for 3- and 5-year periods. Each row of Figure 3
indicates, with an “x”, the link of the associated period to the five years of time
points in our ACS dataset. Suppose we are interested to recover the statistics linked
to the $\ell$-2010 county-year for block b. The highlighted column for 2010 reveals that
there are five statistics for block b that nest $\ell$-2010 and provide some information
for its estimation. There will potentially be many observed statistics used to estimate
$\ell$-2010 in the case where the county nests in multiple blocks.

Figure 3: Conceptual Illustration of Multiple Data Points Linked to each County.


We next illustrate estimation results by comparing the fitted function for a selected
county with the collection of statistics to which it is linked at each time point. To make
the comparison meaningful, we only want to include the portion of each statistic that
provides information about that county; for example, if a county is nested, along with
other counties, in a metropolitan area for which we have an observed statistic, $y_{bq}$, we
would like to extract from the statistic only the portion of the observed employment
level that provides information about that county. We compute a “pseudo” statistic,
$y_{bq/j}$, in Equation 17 for each block, $b$, and period, $q$, linked to a latent,
county-year function parameter, $f_{\ell j} = \mathbf{x}_{\ell j}^{\prime}\boldsymbol{\beta}_{\ell j}$, by subtracting away from statistic, $y_{bq}$, (to
which county-year, $\ell$-$j$, is linked) all other estimated county-year function values
(besides that for $\ell$-$j$) for which $y_{bq}$ also provides information (including years
$(j^{*})$ other than $j$ for county $\ell$). The quantity $\bar{\boldsymbol{\beta}}_{\ell^{*} j^{*}}$ in Equation 17 represents the
posterior mean of the sampled values from our MCMC. (Of course, coefficient values are
sampled at each iteration of the MCMC for estimation, so we could Rao-Blackwellize over
the posterior draws for the coefficient values to create a pseudo statistic, but it is
less intuitive than our proposed construction in Equation 17.)

$$
y_{bq/j} = y_{bq} - \sum_{\ell^{*} \neq \ell \in b}\ \sum_{j^{*} \in q} \mathbf{x}_{\ell^{*} j^{*}}^{\prime} \bar{\boldsymbol{\beta}}_{\ell^{*} j^{*}} - \sum_{j^{*} \neq j \in q} \mathbf{x}_{\ell j^{*}}^{\prime} \bar{\boldsymbol{\beta}}_{\ell j^{*}} \qquad (17)
$$


The posterior for each $P \times T$ matrix of coefficients, $\mathbf{B}_{\ell}$, weights the
contribution of each statistic, $y_{bq}$, in proportion to its precision (inverse
variance), such that statistics associated to block-periods closer in geography (those
that nest relatively fewer counties) and in time exert more influence on the estimated
result. Our presentation of results, to follow, will illustrate the fit mechanism by
plotting each pseudo statistic, $y_{bq/j}$, for county-year, $\ell$-$j$, with the size of the
displayed point in proportion to its precision.
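
The pseudo statistic construction can be sketched numerically as follows. The sketch assumes, as stated for the likelihood, that the mean of a block-period statistic is the plain sum of the county-year function values it nests; the function name `pseudo_statistic`, the array `f_hat` of posterior-mean function values and all numbers are hypothetical and for illustration only.

```python
import numpy as np

def pseudo_statistic(y_bq, f_hat, block, period, county, year):
    """Remove from y_bq every fitted county-year value nested in
    (block x period) except the one for (county, year)."""
    nested_total = f_hat[np.ix_(block, period)].sum()
    return y_bq - (nested_total - f_hat[county, year])

# Hypothetical posterior-mean function values: 3 counties x 5 years.
f_hat = np.arange(15, dtype=float).reshape(3, 5)
block, period = [0, 2], [1, 2, 3]   # counties in the block, years in the period
y_bq = 50.0                         # made-up published block-period statistic
pseudo = pseudo_statistic(y_bq, f_hat, block, period, county=2, year=2)
```

Here the nested fitted values total 42, of which 12 belongs to the target county-year, so the pseudo statistic is 50 - 30 = 20. A pseudo statistic far from the fitted value 12 with a small precision would simply be plotted as a small point.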

The next set of figures illustrates estimated functions for selected counties as compared
to the associated pseudo statistics under the DP mixtures of Gaussian processes model
of Section 2.3. We subsequently compare the fit performances for the clustering prior
formulations of Sections 2.4 and 2.3, which include and exclude predictors, respectively,
in the prior for cluster assignments.

Figure 4 displays the fitted function (the pink line), along with the collections
of pseudo statistics in each year, for a county with 1-year period ACS observations.
The size of each pseudo statistic is in proportion to its precision, with 1-year period
points colored in red, 3-year period points in green and 5-year period points in blue.
Since this county has observed 1-year period statistics, those will be the most precise
(and, hence, largest) for estimating this county. Nevertheless, we see that while the
fitted trend is similar to that expressed by the 1-year estimates, it differs because
the fitted values are influenced by pseudo statistics representing other blocks in which
this county nests. These blocks provide additional information about employment levels
for the county. While the fitted values are more influenced by pseudo statistics that
express higher precision, they are also influenced by the number of such points around a
given value. Here, we see a good coherence between the sets of 3- and 5-year period
estimated pseudo statistics for blocks nesting this county in 2009-2011 (time points
2-4). These values lie below the 1-year values and pull the fitted function down, away
from the 1-year period estimate.

We may not use these pseudo data plots to assess the fit quality, however, precisely
because the pseudo statistics are convolved with the estimation procedure. We may,
nevertheless, comment on the coherence or closeness among estimated pseudo statistics
with relatively larger precision values, which speaks to the strength of estimation.



Figure 4: Estimated Function vs. Pseudo Data for 1-year county: Fitted function (pink
line) compared to the collection of pseudo data points in each year, 2008-2012, for a
large-sized (by population) county, DuPage County, IL, with published 1-year period
estimates. Each hollow circle represents a pseudo statistic and its size is proportional
to its estimated precision. Each hollow circle is colored based on the period of the data
point; red denotes a 1-year period, green denotes a 3-year period and blue denotes a
5-year period.
Figure 5 displays the estimated function compared to pseudo statistics for a county
with 3- and 5-year period observations, but not 1-year period observations. We see a
good coherence between estimated pseudo statistics among the similarly sized blocks in
which this county nests. Figure 6 presents an MCD for which only a single 5-year period
estimate is available. The results also express a good coherence between the relatively
higher precision pseudo statistics because every New England MCD nests in a county,
which in this case also has 1-year period statistics.

Figure 5: Estimated Function vs. Pseudo Data for 3-year county: Fitted function
(pink line) compared to the collection of pseudo data points in each year, 2008-2012,
for a medium-sized (by population) county, Lawrence County, SD, with published 3-year
(but not 1-year) period estimates. Each hollow circle represents a pseudo statistic
and its size is proportional to its estimated precision. Each hollow circle is colored
based on the period of the data point.

We observe in these figures that some of the pseudo statistics are very large in
magnitude (highly positive or negative), though their small precisions result in their
exerting little-to-no influence in the estimation of the functions. The overly high
magnitude values occur where a county is nested in an area far different in size than
itself; e.g. nested in a balance of metropolitan areas, which will potentially include
hundreds of counties. While a state-level estimate may be relatively precise for
estimating a large, state-level quantity, it is highly imprecise for estimating a small,
constituent piece. Thus, there is almost no information borrowed from a block that is
far larger in size than a constituent county, reflecting a limitation in the ability of
the model to borrow information.

Figure 6: Estimated Function vs. Pseudo Data for 5-year county: Fitted function
(pink line) compared to the collection of pseudo statistics in each year, 2008-2012,
for a small-sized (by population) township (MCD), Hadley, Hampshire County, MA,
with published 5-year (but not 1- or 3-year) period estimates. Each hollow circle
represents a pseudo statistic and its size is proportional to its estimated precision.
Each hollow circle is colored based on the period of the data point.

In general, we find that the QCEW super sector employment level predictors help to
identify the county-year functions by providing magnitude information and regulating
the shrinkage of by-county regression coefficients where the county employment levels
span vast differences in the size of their populations and labor markets. Yet, the
resulting modeled estimate is typically quite different in level and trend (not shown)
than the total of the QCEW super sector employment values. We are not surprised because
the QCEW provides place-of-work employment from establishments, while the ACS is a
household survey providing place-of-residence employment.

Our estimation model entirely focuses on estimating fine-level, county-year parameters,
using blocks and periods that nest them. Nevertheless, we have seen that there is
limited information provided to estimate a county by a block observation nesting it that
is much larger (in population and employment) than the county. So, since 74% of counties
lack 1-year period estimates, a question arises about the quality of estimation at
the state level composed by summing over the county-year parameters nested in each
state-year. The roll-up of estimated functions to the states produces estimates for all
states that are within 1-2% of the 1-year period state-level estimates in the ACS.
Figure 7 shows the estimated summed functions compared to the observed data points for
three randomly-selected states, which illustrates that the estimation of latent functions
at the county-year level provides a good estimation for state-level, 1-year period
observations.
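
The state-level roll-up amounts to summing fitted county-year values within each state and comparing the totals with the published state statistics. A minimal sketch follows, in which the function name `roll_up`, the county-to-state lookup `state_of_county` and all numbers are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def roll_up(f_hat, state_of_county, n_states):
    """Sum county-by-year fitted values within each state."""
    totals = np.zeros((n_states, f_hat.shape[1]))
    # Unbuffered add, so that counties sharing a state accumulate correctly.
    np.add.at(totals, state_of_county, f_hat)
    return totals

f_hat = np.array([[10.0, 11.0], [20.0, 19.0], [5.0, 6.0]])   # 3 counties x 2 years
state_of_county = np.array([0, 0, 1])                         # county -> state index
state_fit = roll_up(f_hat, state_of_county, n_states=2)
state_obs = np.array([[30.0, 30.0], [5.0, 6.0]])              # made-up state statistics
rel_diff = np.abs(state_fit - state_obs) / state_obs          # relative difference
```

In practice the relative differences would be inspected state by state and year by year against a tolerance such as the 1-2% range reported above.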


4.1 Assessment of Fit Quality 

Figure 7: County-year fitted values summed to state-level (pink line) versus data values
(hollow circles) for randomly-selected states.

We may not directly assess the fit performance of the estimated county-year functions for
3- and 5-year counties due to the absence of observed 1-year data values. An indication
of fit quality may, however, be provided by excluding the (five) 1-year data values
for a county with available 1-year data values and comparing how the models (that
exclude or include predictors in the prior distributions for cluster assignments)
estimate the county-year function relative to when the 1-year values are included.
Figure 8 presents estimated county-year functions for Craven County, North Carolina.
The top panel displays estimated results under the predictor-assisted clustering model
of Section 2.4, while the bottom panel displays the same under the model that excludes
predictors (in the prior for assignment to clusters) of Section 2.3. The solid, pink line
in each plot panel presents the posterior mean fitted function when excluding the 1-year
data values, while the dashed, blue line presents the same when including the 1-year
values. The gray shading displays the associated 95% credible intervals under exclusion
of the 1-year data values, and the associated pseudo statistics are also constructed
using the fitted functions under exclusion of these values. Finally, the pink, diamond
points display the 1-year data values.

We explored a number of 1-year counties, at random, and found a high degree of
similarity between the estimated county-year functions with and without inclusion of the
1-year data values under both models. The model of Section 2.3, which excludes
predictors in the prior for the cluster assignments, however, tends to consistently
express slightly less difference in estimated functions with and without inclusion of
the 1-year data values. We present Craven County as something of a worst-case result
that provides clearer differentiation between the performances of the two models.
Craven County is a relatively small, 1-year county. The Craven County 1-year data points
would suggest increasing employment through the Great Recession period of 2008-2010,
which is opposite to the pattern in most counties in North Carolina (and the U.S., as a
whole). The estimated employment trend when including the 1-year data values, which is
displayed in the dashed blue line, actually shows an employment decline from 2008-2009,
followed by a recovery in 2009-2010. The other blocks (in addition to the county,
itself) that include Craven County favor a decline-recovery trend, as may be observed
in the associated co-plotted pseudo statistics. The Craven County estimation scheme
balances the (higher precision, more reliable) 1-year data values with the information
conveyed by the blocks at multiple resolutions in which Craven County nests.

We see that both models amplify the estimated employment decline from 2008-2009
when the 1-year data values are excluded, which effectively increases the influence of
the other blocks containing Craven County. Yet, the model excluding predictors in
assigning clusters captures well both the increasing trend from 2010-2011 and the
decreasing trend from 2011-2012. The predictor-assisted clustering model expresses
a slightly steeper decline, followed by a more rapid recovery. It is likely the case that
our predictors, which are intended to measure the composition of the economic activity
of a county, induced co-clustering among counties with this pattern during the Great
Recession. The fitted results under both models may be sensitive to the composition of
the county-year predictors because they are below the resolution of the observed data;
for example, if we were to include additional predictors that provide information about
poverty concentration or educational achievement, the predictor-assisted model might or
might not outperform. In any case, given the estimation sensitivity to predictor values,
they should be carefully chosen based on their ability to comment on the economic
conditions of each county. These results generally suggest that the spatial and temporal
nesting construction that underpins our models may provide reasonable estimates across
counties. The larger credible intervals for the predictor-assisted clustering model
reflect the large space of partitions or clusterings induced when including the
predictors in the prior for the mixing measure.


Table 2 provides fit statistics for the models including, $(Y, X)$, and excluding,
$(Y \mid X)$, predictors in the estimation of clusters. We display the DIC$_3$ criterion
(Celeux et al. 2006), which focuses on the marginal (predictive) density, $f(\mathbf{y})$, in
lieu of $f(\mathbf{y} \mid \text{parameters})$, and is more appropriate for mixture models. Also shown is
the log pseudo marginal likelihood (LPML), which employs “leave-one-out”
cross-validation (Gelfand and Dey 1994). We estimate $\prod_{r} f(y_{r} \mid \mathbf{y}_{-r}, M_{k})$ (where $r$
denotes a block-period case observation), the log of which is the LPML, and where $M_{k}$
indexes a model. We employ a weighted re-sampling of parameters from existing posterior
draws in a fashion that provides model parameter samples from
$f(\text{parameters} \mid \mathbf{y}_{-r}, M_{k})$ (Stern and Cressie 2000). This approach reduces the known
sensitivity to outliers expressed by the LPML.
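
For orientation, the LPML can be computed from stored MCMC output as sketched below. The sketch uses the standard harmonic-mean identity for the conditional predictive ordinate, $\mathrm{CPO}_{r} = \left[S^{-1}\sum_{s} 1 / f(y_{r} \mid \theta^{(s)})\right]^{-1}$, and not the weighted re-sampling variant described above, which is intended to be less sensitive to outliers. The function name `lpml` and the input matrix of pointwise log-likelihood values are hypothetical.

```python
import numpy as np

def lpml(loglik):
    """LPML from an (S draws x R observations) matrix of pointwise
    log-likelihood values, using the harmonic-mean CPO identity."""
    loglik = np.asarray(loglik, dtype=float)
    n_draws = loglik.shape[0]
    neg = -loglik
    shift = neg.max(axis=0)   # per-observation stabilizer for the exponentials
    log_mean_inv = shift + np.log(np.exp(neg - shift).sum(axis=0)) - np.log(n_draws)
    log_cpo = -log_mean_inv   # log CPO for each observation
    return log_cpo.sum(), log_cpo

# Made-up example: two posterior draws, one observation with
# likelihood values 0.5 and 0.25, so CPO = 1 / mean(2, 4) = 1/3.
total, log_cpo = lpml(np.log([[0.5], [0.25]]))
```

Table 2 reports the negative of this quantity, so that lower values indicate better fit.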


Our primary modeling goal, however, is not “out-prediction”, beyond the data, but
“in-prediction” at a resolution lower than the observed data. We, nevertheless, see that
the predictor-assisted clustering model does not provide a sufficiently better mean
deviance, $\bar{D}$, than the simpler model to justify the added complexity. The similar fit
statistics, combined with the lower perturbation in the estimated functions illustrated
in Figure 8, incline us to prefer the simpler model of Section 2.3.
Figure 8: Comparison of model-estimated values for a 1-year county (Craven County,
NC) when excluding 1-year data values. The top plot panel provides results for the
predictor-assisted clustering model (which we label, (Y,X)), while the plot in the bottom
panel excludes predictors in the prior for cluster assignments (which we label, (Y|X)).
The solid, pink line in each plot panel presents the posterior mean fitted function when
excluding the 1-year data points, while the dashed, blue line presents the posterior
mean when including the 1-year data points. The gray shading represents the 95% credible
intervals as estimated on the models excluding 1-year data points. The associated
pseudo statistics are also estimated from the models excluding 1-year data points.
The solid pink diamonds plot the 1-year data points.

5 Simulation Study 

Table 2: Fit performance comparison between the model including predictors in the prior
for cluster assignments, (Y,X), and the model excluding predictors in clustering, (Y|X).
Lower values indicate better fit performance for all included fit statistics.

             (Y,X)     (Y|X)
-LPML        233517    228181
DIC$_3$      449663    450199
$\bar{D}$    444634    446928

Our examination of results for the ACS helped provide insight on the fit performance,
but perhaps does not fully address the quality of fit for counties with only 3- and
5-year data values. To address quality of fit for these counties, we generate synthetic
values for coefficients, $(\mathbf{B}_{\ell})$, employing the posterior means of covariance
parameters, $(\Lambda_{\ell}, \kappa_{\ell})$, from the model of Section 2.3. We next compute
$f_{\ell j} = \mathbf{x}_{\ell j}^{\prime}\boldsymbol{\beta}_{\ell j}$, where $\mathbf{x}_{\ell j}$ is observed (known). We next generate
$y_{bq} \sim \mathcal{N}\left(\sum_{\ell \in b}\sum_{j \in q} f_{\ell j}, \ldots\right)$. The same nesting relationships of (county, year) to
(block, period) from the ACS are duplicated for the simulation study, so that we are
generating a synthetic version of ACS employment counts. Of course, this simulation
assumes that our spatial and temporal nesting construction is the correct generating
model, which we do not know to be the case, though the fit performance on 1-year
counties when excluding the 1-year data values suggests that this assumption may be
broadly reasonable. Figure 9 presents the pseudo statistics, fitted function (denoted by
a pink line) and associated 95% credible interval (denoted by gray shading), along with
the true function (denoted by the dashed, blue line) for a 3-year county. It reveals that
our model also does well on a county for which we have 3-year period statistics, but not
1-year period statistics.
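
The final generating step of the simulation can be sketched as follows. The sketch assumes Gaussian noise with a known standard deviation for each block-period statistic (the variance term is not recoverable from the text here, so this is an assumption) and a mean equal to the sum of the nested county-year function values. The function name `simulate_statistics` and all inputs are hypothetical.

```python
import numpy as np

def simulate_statistics(f, blocks, periods, sd, rng):
    """Draw one synthetic statistic per (block, period) pair.

    f       : (counties x years) array of true function values
    blocks  : list of county-index lists, one per observation
    periods : list of year-index lists, aligned with blocks
    sd      : standard deviation of each statistic (assumed known)
    """
    means = np.array([f[np.ix_(b, q)].sum() for b, q in zip(blocks, periods)])
    noise = rng.normal(0.0, 1.0, size=means.size) * np.asarray(sd, dtype=float)
    return means + noise, means

rng = np.random.default_rng(1)
f = np.ones((3, 5))                        # made-up true functions: 3 counties x 5 years
blocks = [[0], [0, 1, 2]]                  # a single county, then a 3-county block
periods = [[2], [0, 1, 2, 3, 4]]           # a 1-year period, then a 5-year period
y, means = simulate_statistics(f, blocks, periods, sd=[0.5, 2.0], rng=rng)
```

Reusing the observed nesting lists for `blocks` and `periods` is what makes the synthetic data mimic the multiresolution structure of the published statistics.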

Figure 9: Fitted versus data values for simulated 3-year county.

Similarly to the 3-year county result, Figure 10 presents typical results for a county
with only a single, 5-year statistic available in the case where that county is nested in
a block relatively near to it in size. As mentioned earlier, this situation is typical for
MCDs, which by construction (in New England) are nested within counties. While we
see that the fitted result expresses more smoothness than the truth, it does generally
follow local features in the true trend, and the credible interval is wider than those
for counties with published 3-year period statistics.


Figure 11 presents estimated results for a county with only a single 5-year period
observed statistic and that is nested in a block far different (much larger) in size. The
true trend is similar to that in Figure 10 and we see that the fitted function expresses a
greater degree of over-smoothing and is unable to capture local features in time, though
the overall true trend and magnitude are still captured. Adding data for upcoming
years will bring in additional 5-year period statistics, which are expected to improve
the quality of estimation for these far-nested counties by borrowing strength over
periods, rather than blocks.

Figure 10: Fitted versus data values for simulated 5-year county linked to one or more
blocks of similar size.

Figure 11: Fitted versus data values for simulated 5-year county linked only to blocks
much larger in size.
6 Discussion


Motivated by the use of ACS employment data at the BLS to allocate statewide CPS
employment estimates to sub-state, local areas, we have developed a general approach
to estimate fine-scale time and areal-indexed parameters using an ensemble of
coarse-scale observations that spatially and temporally nest the parameters. We specify
the likelihood to link subsets of the parameters that exhaustively nest each
block-period observation. Our best-performing Bayesian multiscale model of Section 2.3
formulates a relatively simple nonparametric mixture model for estimating the latent
county functions in a fashion that facilitates the shrinking together of similar
functions by the data. The flexible shrinking under the Bayesian nonparametric approach,
which penalizes complexity, combined with leveraging nesting relationships to identify
an ensemble of observations that provide information about each latent parameter,
provides a broadly useful approach.

Many ACS users, such as the LAUS program in BLS, would prefer to employ 1-year
period statistics for counties, but are relegated to using 5-year period published
statistics in the case where analyses are conducted across all counties in the U.S. Results
from our simulation study demonstrate that our approach performs well to uncover the
latent true county-year parameters for 3-year counties and 5-year counties, where
the 5-year counties nest within similarly-sized blocks (along with few other counties).
There was some notable over-smoothing of the estimated county function (though the
magnitude and global trend are captured) for 5-year counties exclusively nested in
much larger-sized blocks, which occurs because we only have a single, 5-year period
statistic for these counties. We expect improvements in the fit accuracy for these
counties as we add upcoming years to the five years of data that we considered for our
analysis because our mixtures of Gaussian processes formulation borrows strength across
years. Employing an ensemble of statistics published at varied resolutions even adds
value for the estimation of counties with 1-year period statistics by incorporating the
additional statistics associated to blocks nesting each 1-year county. Our approach may
be applied to any variable from the ACS, as well as to other data sets that express this
multiresolution structure.